feat(elastic-search): improved default search #3284

Status: Open (6 commits, base: master)

Collaborator

@martijnvdbrug martijnvdbrug commented Dec 20, 2024

Description

Minor tweaks to improve the out-of-the-box search results from Elasticsearch.

It was a bit demotivating to see that my search results were worse than with the default plugin, even though ES is such a powerful engine. In my case this was due to:

  1. Description fields being weighted just as heavily as name fields
  2. No typo tolerance (fuzziness)

Most consumers probably define their own queries, but those starting with the defaults get a better experience with this change.
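
As a rough illustration of the two tweaks above, a query along these lines weights name fields above the description and tolerates typos. This is a sketch, not the code in this PR: the field names, the boost values, and the `buildSearchQuery` helper are assumptions made for illustration. `multi_match`, the `field^N` boost syntax, and `fuzziness: 'AUTO'` are standard Elasticsearch query DSL.

```typescript
// Shape of the query body produced by the hypothetical helper below.
interface MultiMatchQuery {
  multi_match: {
    query: string;
    fields: string[];
    fuzziness: string;
  };
}

// Hypothetical helper (not part of the plugin): builds a boosted, fuzzy query body.
function buildSearchQuery(term: string): MultiMatchQuery {
  return {
    multi_match: {
      query: term,
      // `field^N` multiplies that field's score by N, so a hit in a name
      // field outranks the same hit in the description. Values are assumed.
      fields: ['productName^5', 'productVariantName^5', 'description^1', 'sku^1'],
      // 'AUTO' picks the allowed edit distance from the length of each term.
      fuzziness: 'AUTO',
    },
  };
}

const query = buildSearchQuery('camra');
console.log(JSON.stringify(query, null, 2));
```

With a body like this, a misspelled term such as `camra` can still match `camera`, and a name match contributes more to the score than a description match.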

Breaking changes

No

Checklist

📌 Always:

  • I have set a clear title
  • My PR is small and contains a single feature
  • I have checked my own PR

👍 Most of the time:

  • I have added or updated test cases
  • I have updated the README if needed

@martijnvdbrug
Collaborator Author

@monrostar I know you guys have done a lot of work on this plugin, so perhaps you can take a look if this doesn't conflict with any of your use cases?

{ productId: 'T_3', enabled: false },
]);
const t3 = result.search.items.find(i => i.productId === 'T_3');
expect(t3?.enabled).toEqual(false);
Collaborator Author

@martijnvdbrug martijnvdbrug Dec 24, 2024

Fuzzy matching now returns multiple results, but this test only cares whether T_3 is disabled, so we should ignore the other results.
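
For context on why a fuzzy query returns more hits: with Elasticsearch's `fuzziness: AUTO` (an assumption here, since the exact setting used in this PR isn't quoted in this thread), terms of 1-2 characters must match exactly, terms of 3-5 characters may differ by one edit, and longer terms by two. The sketch below mimics that rule with a plain Levenshtein distance. `maxEditsForAuto` and `fuzzyMatches` are hypothetical helper names, and real Elasticsearch also counts a transposition as a single edit by default, which this simplification does not.

```typescript
// Allowed edit distance for a term under the AUTO rule described above.
function maxEditsForAuto(term: string): number {
  if (term.length <= 2) return 0;
  if (term.length <= 5) return 1;
  return 2;
}

// Classic Levenshtein distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// True when the indexed term is within the edit budget of the search term.
function fuzzyMatches(searchTerm: string, indexedTerm: string): boolean {
  const distance = levenshtein(searchTerm.toLowerCase(), indexedTerm.toLowerCase());
  return distance <= maxEditsForAuto(searchTerm);
}

console.log(fuzzyMatches('camra', 'camera'));
console.log(fuzzyMatches('camra', 'lens'));
```

Because near-misses now count as matches, a search term can hit several products at once, which is why a test that previously saw a single result may now see more.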

'Camera Lens',
'Instant Camera',
Collaborator Author

Camera Lens is now the first result because the name is weighted more heavily. In most cases this is desired, but this test case is debatable... WDYT?

@monrostar
Contributor

monrostar commented Dec 25, 2024

> @monrostar I know you guys have done a lot of work on this plugin, so perhaps you can take a look if this doesn't conflict with any of your use cases?

Hi, sorry for the late reply. You can do what you want. We currently use our own plugin for Elasticsearch. We index one document per variant, covering all channels, translations, and currencies. Unfortunately I had to completely rewrite the original plugin. I'd like to contribute this code, but we don't have plans for that yet...

Here's a small example of the new structure:

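// Assumed imports: the snippet uses `estypes` and `LanguageCode` without showing where they come from.
import { estypes } from '@elastic/elasticsearch'
import { LanguageCode } from '@vendure/core'

const defaultAvailableLanguages = [LanguageCode.en]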

const languageAnalyzerMap: Partial<Record<LanguageCode, string>> & { default: string } = {
  [LanguageCode.ar]: 'arabic',
  [LanguageCode.hy]: 'armenian',
  [LanguageCode.eu]: 'basque',
  [LanguageCode.bn]: 'bengali',
  [LanguageCode.pt_BR]: 'brazilian',
  [LanguageCode.bg]: 'bulgarian',
  [LanguageCode.ca]: 'catalan',
  [LanguageCode.cs]: 'czech',
  [LanguageCode.da]: 'danish',
  [LanguageCode.nl]: 'dutch',
  [LanguageCode.en]: 'english',
  [LanguageCode.en_AU]: 'english',
  [LanguageCode.en_CA]: 'english',
  [LanguageCode.en_GB]: 'english',
  [LanguageCode.en_US]: 'english',
  [LanguageCode.et]: 'estonian',
  [LanguageCode.fi]: 'finnish',
  [LanguageCode.fr]: 'french',
  [LanguageCode.gl]: 'galician',
  [LanguageCode.de]: 'german',
  [LanguageCode.el]: 'greek',
  [LanguageCode.hu]: 'hungarian',
  [LanguageCode.id]: 'indonesian',
  [LanguageCode.ga]: 'irish',
  [LanguageCode.it]: 'italian',
  [LanguageCode.lv]: 'latvian',
  [LanguageCode.lt]: 'lithuanian',
  [LanguageCode.nb]: 'norwegian',
  [LanguageCode.nn]: 'norwegian',
  [LanguageCode.pt]: 'portuguese',
  [LanguageCode.ro]: 'romanian',
  [LanguageCode.ru]: 'russian',
  [LanguageCode.sr]: 'serbian',
  [LanguageCode.es]: 'spanish',
  [LanguageCode.sv]: 'swedish',
  default: 'standard',
}

function getAnalyzerForLanguage(languageCode: LanguageCode): string {
  return languageAnalyzerMap[languageCode] || languageAnalyzerMap.default
}

export const buildIndexName = (prefix: string, name: string, postfix = ''): estypes.IndexName => `${prefix}${name}${postfix}`
export const buildAliasName = (prefix: string, name: string, postfix = ''): estypes.IndexAlias => `${prefix}${name}${postfix}`

export function TranslatedTextKeywordMappingField(): estypes.MappingObjectProperty {
  return {
    type: 'object',
    properties: defaultAvailableLanguages.reduce((acc, lang) => {
      acc[lang] = {
        type: 'text',
        analyzer: `${getAnalyzerForLanguage(lang)}_analyzer`,
        fields: {
          keyword: {
            type: 'keyword',
          },
        },
      }
      return acc
    }, {} as Record<LanguageCode, estypes.MappingProperty>),
  }
}

export function TranslatedTextMappingField(): estypes.MappingObjectProperty {
  return {
    type: 'object',
    properties: defaultAvailableLanguages.reduce((acc, lang) => {
      acc[lang] = {
        type: 'text',
        analyzer: `${getAnalyzerForLanguage(lang)}_analyzer`,
      }
      return acc
    }, {} as Record<LanguageCode, estypes.MappingProperty>),
  }
}

const priceMappingField: estypes.MappingProperty = {
  type: 'nested',
  properties: {
    id: { type: 'keyword' },
    channelId: { type: 'keyword' },
    currencyCode: { type: 'keyword' },
    price: { type: 'integer' },
  },
}

function generateDynamicTemplatesAndAnalyzers() {
  // An array of single-key objects, so that `.push()` type-checks below.
  const dynamicTemplates: Record<string, estypes.MappingDynamicTemplate>[] = []
  const analyzers: Record<string, estypes.AnalysisAnalyzer> = {
    standard_analyzer: {
      type: 'custom',
      tokenizer: 'standard',
      filter: ['lowercase', 'asciifolding'],
    },
  }
  const filters: Record<string, estypes.AnalysisTokenFilter> = {}

  for (const langCode of Object.values(LanguageCode)) {
    const analyzerName = getAnalyzerForLanguage(langCode)
    const effectiveAnalyzer = analyzerName ? `${analyzerName}_analyzer` : 'standard_analyzer'

    dynamicTemplates.push({
      [`language_analyzer_${langCode}_productName`]: {
        match_mapping_type: 'string',
        path_match: `productName.${langCode}`,
        mapping: {
          type: 'text',
          analyzer: effectiveAnalyzer,
          fields: {
            keyword: {
              type: 'keyword',
              ignore_above: 256,
            },
          },
        },
      },
    })

    dynamicTemplates.push({
      [`language_analyzer_${langCode}_variantName`]: {
        match_mapping_type: 'string',
        path_match: `variantName.${langCode}`,
        mapping: {
          type: 'text',
          analyzer: effectiveAnalyzer,
          fields: {
            keyword: {
              type: 'keyword',
              ignore_above: 256,
            },
          },
        },
      },
    })

    dynamicTemplates.push({
      [`language_analyzer_${langCode}_productDescription`]: {
        match_mapping_type: 'string',
        path_match: `productDescription.${langCode}`,
        mapping: {
          type: 'text',
          analyzer: effectiveAnalyzer,
        },
      },
    })

    if (analyzerName && analyzerName !== 'standard') {
      analyzers[`${analyzerName}_analyzer`] = {
        type: 'custom',
        tokenizer: 'standard',
        filter: ['lowercase', 'asciifolding', `${analyzerName}_stemmer`],
      }
      filters[`${analyzerName}_stemmer`] = {
        type: 'stemmer',
        language: analyzerName,
      }
    }
  }

  return { dynamicTemplates, analyzers, filters }
}

const ProductVariantIndexMappingProperties: { [key in keyof VariantIndexItem]: estypes.MappingProperty } = {
  // index date
  lastSyncedAt: { type: 'date' },
  productUpdatedAt: { type: 'date' },
  productCreatedAt: { type: 'date' },
  // product fields
  productId: { type: 'keyword' },

  productChannelIds: { type: 'keyword' },
  productCollectionIds: { type: 'keyword' },
  productFacetValueIds: { type: 'keyword' },
  productFacetIds: { type: 'keyword' },

  productOptions: { type: 'flattened' },
  productOptionsGroups: {
    type: 'nested',
    properties: {
      code: { type: 'keyword' },
      id: { type: 'keyword' },
      name: TranslatedTextKeywordMappingField(),
      options: {
        type: 'nested',
        properties: {
          id: { type: 'keyword' },
          name: TranslatedTextKeywordMappingField(),
          code: { type: 'keyword' },
        },
      },
    },
  },
  productEnabled: { type: 'boolean' },
  productInStock: { type: 'boolean' },

  productName: TranslatedTextKeywordMappingField(),
  productSlug: TranslatedTextKeywordMappingField(),
  productDescription: TranslatedTextMappingField(),

  productPriceMax: priceMappingField,
  productPriceMin: priceMappingField,

  productAssetId: { type: 'keyword' },
  productPreview: { type: 'keyword' },
  productPreviewFocalPoint: { type: 'flattened' },
  productAssets: { type: 'flattened' },

  // variant fields
  variantUpdatedAt: { type: 'date' },
  variantCreatedAt: { type: 'date' },
  variantId: { type: 'keyword' },

  variantChannelIds: { type: 'keyword' },
  variantCollectionIds: { type: 'keyword' },
  variantFacetIds: { type: 'keyword' },
  variantFacetValueIds: { type: 'keyword' },

  variantEnabled: { type: 'boolean' },
  variantInStock: { type: 'boolean' },
  variantDisplayStockLevel: { type: 'keyword' },

  variantName: TranslatedTextKeywordMappingField(),
  variantSku: { type: 'keyword' },

  variantOptions: {
    type: 'nested',
    properties: {
      code: { type: 'keyword' },
      id: { type: 'keyword' },
      name: TranslatedTextKeywordMappingField(),
      group: {
        type: 'object',
        properties: {
          id: { type: 'keyword' },
          name: TranslatedTextKeywordMappingField(),
          code: { type: 'keyword' },
        },
      },
    },
  },

  variantPrice: priceMappingField,

  variantAssetId: { type: 'keyword' },
  variantPreview: { type: 'keyword' },
  variantPreviewFocalPoint: { type: 'flattened' },
  variantAssets: { type: 'flattened' },
}
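
To show how the pieces above might fit together, here is a sketch that combines analyzers, filters, dynamic templates, and mapping properties into a create-index request body. `buildCreateIndexBody`, the loose `Json` type, and the sample values are hypothetical and not part of the plugin described in this comment. The `settings.analysis` and `mappings.dynamic_templates` / `mappings.properties` layout follows the standard Elasticsearch create-index API.

```typescript
// Loose JSON-like type so the sketch stays self-contained (assumption:
// the real code would use the `estypes` definitions instead).
type Json = Record<string, unknown>;

interface IndexParts {
  dynamicTemplates: Json[];
  analyzers: Json;
  filters: Json;
  properties: Json;
}

// Hypothetical helper: arranges the generated parts into a create-index body.
function buildCreateIndexBody(parts: IndexParts) {
  return {
    settings: {
      analysis: {
        // Custom analyzers such as `english_analyzer` must be declared here
        // before a field mapping can reference them by name.
        analyzer: parts.analyzers,
        filter: parts.filters,
      },
    },
    mappings: {
      dynamic_templates: parts.dynamicTemplates,
      properties: parts.properties,
    },
  };
}

// Sample input mirroring the shapes produced in the snippet above.
const body = buildCreateIndexBody({
  analyzers: {
    english_analyzer: {
      type: 'custom',
      tokenizer: 'standard',
      filter: ['lowercase', 'asciifolding', 'english_stemmer'],
    },
  },
  filters: { english_stemmer: { type: 'stemmer', language: 'english' } },
  dynamicTemplates: [],
  properties: { productId: { type: 'keyword' }, productEnabled: { type: 'boolean' } },
});

console.log(JSON.stringify(body, null, 2));
```

The design point is ordering: every `*_analyzer` name used in a mapping has to exist under `settings.analysis.analyzer` when the index is created, which is presumably why the analyzers and templates are generated together.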
